ABSTRACT

In this paper, we propose a mathematical framework for automated bug localization. This framework can be
briefly summarized as follows. A program execution can be represented as a rooted acyclic directed graph.
We define an execution snapshot by a cut-set on the graph. A program state can be regarded as a conjunction
of labels on edges in a cut-set. Then we argue that a debugging task is a pruning process of the execution
graph by using cut-sets. A pruning algorithm, i.e., a debugging task, is also presented.

1 Introduction



Algorithmic debugging or automated bug localization techniques have been studied for more than two
decades. Many individual efforts have been published and implemented, but no comprehensive
and standardized frameworks have been proposed so far. In this paper, we propose a mathematical
framework for automated bug localization techniques.

2 A framework for bug localization

Definition 1 (execution graph) An execution graph $G = (v_0, V, E_d \cup E_c)$ is a rooted acyclic directed
graph, which represents an instance of an execution of a program. Here, $v_0$, $V$, $E_d$, and $E_c$ are a root
vertex, a set of vertices, a set of data edges, and a set of control edges, respectively.

The root vertex represents the start point of the program. A vertex in $V$ represents some operation
during the execution, such as an assignment, a unification, or the sending of a message. A data edge is
labeled by the information that is carried along it. Typically, this edge represents a relation between
set/use events on the same variable and is labeled by a (variable name, value) pair. A control edge specifies
a relation between a controlling vertex and a controlled vertex. For example, a vertex which represents
the predicate of an if statement controls the vertices that denote statements in its then and else clauses.
The function that maps an edge $e$ to its label is denoted as $\mathit{label}(e)$. A control edge is always
labeled as "true."
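
As an illustration of Definition 1, the following is a minimal sketch of one possible in-memory representation of an execution graph. The class names `Edge` and `ExecutionGraph`, the helper methods, and the three-statement example program are assumptions made for this sketch; they are not part of the framework itself.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Edge:
    src: str       # initial vertex of the edge
    dst: str       # terminal vertex of the edge
    kind: str      # "data" or "control"
    label: object  # (variable, value) on data edges, True on control edges


@dataclass
class ExecutionGraph:
    root: str
    vertices: set = field(default_factory=set)
    edges: list = field(default_factory=list)

    def __post_init__(self):
        # The root vertex is always part of the graph.
        self.vertices.add(self.root)

    def add_data_edge(self, src, dst, variable, value):
        # A data edge carries a (variable name, value) pair.
        self.vertices.update((src, dst))
        self.edges.append(Edge(src, dst, "data", (variable, value)))

    def add_control_edge(self, src, dst):
        # A control edge is always labeled "true".
        self.vertices.update((src, dst))
        self.edges.append(Edge(src, dst, "control", True))


def label(e):
    """The mapping from an edge to its label."""
    return e.label


# Hypothetical execution of:  x := 1; if x > 0 then y := x + 1
g = ExecutionGraph(root="start")
g.add_control_edge("start", "x:=1")
g.add_data_edge("x:=1", "x>0", "x", 1)
g.add_data_edge("x:=1", "y:=x+1", "x", 1)
g.add_control_edge("x>0", "y:=x+1")

for e in g.edges:
    print(e.kind, e.src, "->", e.dst, label(e))
```

Vertices are plain strings here only to keep the sketch short; a real tool would more likely attach source positions and event timestamps to them.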

Depending on the programming paradigm, a program dependence graph [FOW87], a proof tree,
or another similar graph representation can be employed as the basis of an execution graph.

In M. Ronsse, K. De Bosschere (eds), proceedings of the Fifth International Workshop on Automated Debugging
(AADEBUG 2003), September 2003, Ghent. COmputer Research Repository (http://www.acm.org/corr/), cs.SE/yymmnnn;
whole proceedings: cs.SE/0309027.

Definition 2 (cut-set) In a connected graph $G$ (like an execution graph), a cut-set is a set of edges whose
removal from $G$ leaves $G$ disconnected. We write $C = (G, G_1, G_2)$ if a cut-set $C$ cuts a graph $G$ into two
mutually disconnected subgraphs $G_1$ and $G_2$, where $C = \{(v_1, v_2) \in E_d \cup E_c \mid v_1 \in G_1, v_2 \in G_2\}$.
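
To make Definition 2 concrete, here is a small sketch that derives a cut-set from the vertex set of $G_1$. The tuple encoding `(v1, v2, label)` for an edge, the function name `cut_set`, and the example edges are assumptions of this sketch.

```python
def cut_set(edges, g1_vertices):
    """Return the edges (v1, v2, label) that lead from G1 into G2.

    `edges` is an iterable of (v1, v2, label) tuples and `g1_vertices`
    is the set of vertices of G1; every other vertex belongs to G2.
    """
    return {
        (v1, v2, lab)
        for (v1, v2, lab) in edges
        if v1 in g1_vertices and v2 not in g1_vertices
    }


# Hypothetical execution of:  x := 1; if x > 0 then y := x + 1
EDGES = [
    ("start", "x:=1", True),        # control edge
    ("x:=1", "x>0", ("x", 1)),      # data edge
    ("x:=1", "y:=x+1", ("x", 1)),   # data edge
    ("x>0", "y:=x+1", True),        # control edge
]

initial_cut = cut_set(EDGES, {"start"})
after_assignment = cut_set(EDGES, {"start", "x:=1"})
print(initial_cut)
print(after_assignment)
```

The sketch assumes that $G_1$ is closed under predecessors, so that no edge runs from $G_2$ back into $G_1$; a vertex set without this property would not correspond to a snapshot of the execution.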

Definition 3 (the order of two cut-sets) The order of two cut-sets $C_a$ and $C_b$ is defined as follows.

$C_a \prec C_b \iff G_1^a$ is a (proper) subgraph of $G_1^b$, and $C_a = C_b \iff G_1^a$ is identical to $G_1^b$,

where $C_a = (G, G_1^a, G_2^a)$ and $C_b = (G, G_1^b, G_2^b)$. The relation $\preceq$ defines a partial order on cut-sets.
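
The order of Definition 3 can be sketched by comparing the vertex sets of the $G_1$ sides. This sketch assumes that each cut-set is identified by the vertex set of its $G_1$ and that subgraphs are induced by their vertices; the function names and the two-process example are hypothetical.

```python
def precedes(g1_a, g1_b):
    """C_a strictly precedes C_b: G1 of C_a is a proper subset of G1 of C_b."""
    return g1_a < g1_b


def precedes_or_equal(g1_a, g1_b):
    """C_a precedes or equals C_b."""
    return g1_a <= g1_b


# Hypothetical execution with two independent processes P1 and P2.
at_start = frozenset({"start"})
after_p1 = frozenset({"start", "P1:send"})
after_p2 = frozenset({"start", "P2:send"})
after_both = frozenset({"start", "P1:send", "P2:send"})

print(precedes(at_start, after_p1))    # an earlier snapshot
print(precedes(after_p1, after_p2))    # neither precedes the other
print(precedes(after_p2, after_p1))
print(precedes(after_p1, after_both))
```

The two snapshots `after_p1` and `after_p2` are incomparable, which is why the order is only partial.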

For any given graph, many cut-sets exist. But only some of them can be used for debugging,
because such cut-sets must have two important properties: reproducibility, and stoppability
without any influence on the program execution. These properties are especially problematic for
parallel, concurrent, or distributed programs, which may have data races or deadlocks.

Definition 4 (state) For a given cut-set $C$, we define the state of an execution graph on $C$ as follows.

$S_C = \bigwedge_{e \in C} \mathit{label}(e)$.
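
Definition 4 can be sketched as follows, assuming that a conjunction of labels is modeled as a set of conjuncts and that an edge is encoded as a `(v1, v2, label)` tuple. The helper `holds` and the example assertion are hypothetical and only illustrate how a property could be checked against a state.

```python
def state(cut):
    """The state on a cut-set: the conjunction of its edge labels.

    The conjunction is modeled as a frozenset of conjuncts, so a label
    that occurs on several edges of the cut appears only once.
    """
    return frozenset(lab for (_v1, _v2, lab) in cut)


def holds(prop, s):
    """Evaluate a property on the variable bindings found in state `s`.

    Control labels (True) carry no variable binding and are skipped.
    """
    bindings = dict(lab for lab in s if lab is not True)
    return prop(bindings)


# Cut-set taken right after the assignment x := 1 (hypothetical example).
cut = {
    ("x:=1", "x>0", ("x", 1)),
    ("x:=1", "y:=x+1", ("x", 1)),
}

s = state(cut)
print(s)
print(holds(lambda env: env["x"] > 0, s))
```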

Intuitively speaking, any program execution can be represented as a data- and control-flow graph
even if the program isn't written in a procedural language. A cut-set is a mathematical view of a
snapshot of the execution. The order of cut-sets, therefore, shows which snapshot comes earlier in the
execution. A state means the program state to be examined at that snapshot.

Definition 5 (debugging) Debugging is a pruning process of an execution graph. It starts when a
programmer becomes aware of one of the following phenomena.

local data anomaly: For some data edge $e$, $\mathit{label}(e)$ doesn't correspond to the programmer's intention.

local control anomaly: For some control edge $e$, a programmer concludes the edge shouldn't exist. In other
words, the operation on the terminal vertex of the edge shouldn't have been executed.

global anomaly: For a property $A$, like an assertion, a state $S_C$ on a cut-set $C$ violates it. This kind of
anomaly is well known as a synchronization error. Deadlock is a typical case.

And the pruning process is as follows.

1. On finding a local anomaly, choose a cut-set $C_e$ which includes the edge identified as the
anomaly. Otherwise, let $C_e$ be the cut-set on which the programmer has found the global anomaly.

2. $C_c \leftarrow$ { all out-edges of the root vertex }. Here, it is obvious that $C_c \preceq C_e$.

All we have to do is to identify one or more vertices that originate the anomaly. Such vertices
surely exist between $C_c$ and $C_e$. Starting with the original execution graph, the following process (a
kind of binary search) successively prunes subgraphs which cannot contain culprits of the anomaly.

3. Choose an appropriate $C_t$ such that $C_c \prec C_t \prec C_e$. If such a cut-set doesn't exist (typically, only
zero or one vertex exists between $C_c$ and $C_e$), go to step 5.

4. Examine the state $S_{C_t}$ on $C_t$. If the state contains one or more anomalies, $C_e \leftarrow C_t$. Otherwise,
$C_c \leftarrow C_t$. Then go to step 3.

5. If $C_c = C_e$ (no vertices exist between the two cut-sets), it means that some indispensable operations
are missing at that execution point. Otherwise (exactly one vertex remains between the two
cut-sets), there are two types of culprits on $e \in C_e - C_c$.

(a) If $e$ has a local anomaly, the initial vertex of $e$ is the culprit. The operation at that
vertex is probably wrong.

(b) Otherwise, $C_e$ must have a global anomaly. We can find all culprits as follows:

i. $M \leftarrow \emptyset$

ii. for each $a \in S_{C_e}$ do
$M \leftarrow M \cup \{a\}$ if $S_{C_e} - \{a\}$ doesn't have the global anomaly.

All initial vertices of edges in $M$ are culprits. That is to say, such vertices indicate missing
critical sections starting at those execution points.

[Figure 1: Interpretation of Shapiro's method. The figure draws an execution graph for the following
Prolog program:

    isort([X|Xs], Ys) :- isort(Xs, Zs), insert(X, Zs, Ys).
    isort([], []).

    insert(X, [Y|Ys], [Y|Zs]) :- Y > X, insert(X, Ys, Zs).
    insert(X, [Y|Ys], [X,Y|Ys]) :- X =< Y.
    insert(X, [], [X]).
]

[Figure 2: Interpretation of Shimomura's method. The figure draws an execution graph whose vertices are
executed instructions; the vertex labels include get(n, a), t := 1, s := a[1], i := 2, i <= n,
s := s - a[i], i := i + 1, s > t, and put(s).]

The figures mark the cut-sets $C_t$ and $C_e$. The legend distinguishes executed instructions, data edges
(only some of which are labeled), and control edges.
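
The pruning process of steps 1 to 5 can be sketched as follows. This is a simplified sketch, not the authors' implementation: it assumes that the candidate cut-sets form a totally ordered chain $C_0 \prec C_1 \prec \dots \prec C_{n-1}$ with exactly one vertex between neighbours (as with breakpoints in a sequential execution), whereas the framework only requires a partial order. The callbacks `has_anomaly` and `violates` stand in for the programmer's (or an assertion checker's) judgement and are hypothetical names.

```python
def prune(n_cuts, has_anomaly):
    """Binary search over cut-sets C_0 ... C_{n_cuts-1} (steps 2 to 4).

    Returns indices (c, e) of adjacent cut-sets: the state on C_c is
    correct, the state on C_e is anomalous, so the culprit vertex lies
    between them.
    """
    c, e = 0, n_cuts - 1       # C_c: out-edges of the root; C_e: anomaly seen
    while e - c > 1:           # step 3: some C_t with C_c < C_t < C_e exists
        t = (c + e) // 2
        if has_anomaly(t):     # step 4: examine the state on C_t
            e = t
        else:
            c = t
    return c, e


def global_culprits(state, violates):
    """Step 5(b): conjuncts whose removal makes the global anomaly vanish."""
    m = set()
    for a in state:
        if not violates(state - {a}):
            m.add(a)
    return m


# Hypothetical run: the state first becomes wrong at cut-set 6 of 16.
asked = []

def oracle(t):
    asked.append(t)
    return t >= 6

print(prune(16, oracle), asked)

# Hypothetical deadlock: P1 waits for B while P2 waits for A.
final_state = frozenset({("P1", "waits for B"), ("P2", "waits for A"), ("x", 3)})

def deadlocked(s):
    return ("P1", "waits for B") in s and ("P2", "waits for A") in s

print(global_culprits(final_state, deadlocked))
```

With 16 candidate cut-sets the oracle is consulted four times, which reflects the binary-search character of the process. In the deadlock example the unrelated binding of `x` is not reported, because removing it leaves the anomaly in place.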



Shapiro's algorithmic debugging was invented for Prolog programs [Sha82]. Fig. 1 shows our
interpretation of his work. From our viewpoint, it uses a proof tree as an execution graph. (Attention:
This interpretation differs from a normal proof tree. Our interpretation is based on a line graph¹ of
a normal proof tree.) He used only one edge as a cut-set since removal of any edge divides a tree
into two disconnected subtrees. A state is also simple because only one label, i.e., one unified clause,
is enough. In this work, step 3 of the pruning process is fully automated and a programmer
carries out step 4 by answering "yes" or "no" to tell the system whether the label on the edge is correct.
GADT [FGKS91] and Lichtenstein's system [LS89] can be interpreted in the same manner because
they are straightforward extensions of Shapiro's work.

FIND was developed for sequential procedural languages [SOCO95]. Our interpretation of this work
is shown in Fig. 2. It uses an execution graph that represents a critical slice, which is an extension of
a dynamic slice [KL88]. A vertex represents a statement execution and an edge represents some
relation between two vertices, such as a set/use relation on the value of some variable or a control relation
between a conditional statement and another statement. FIND uses a traditional breakpoint as a cut-set. A
state is represented as the data- and control-flows across the cut-set, which matches the ordinary meaning
of the word state as used for procedural programs. This system carries out step 3 automatically and step 4
manually. A programmer examines whether both the data edges and the control edges on
a cut-set (breakpoint) are correct.

¹ A line graph is obtained by interchanging the vertices and edges of an original graph.



3 Related Works

FORMAN [Aug98] also uses a directed graph representing an event trace. It uses two types of edges
(relations) between events: precedence and inclusion. Compared with our approach, FORMAN has
the advantage of being able to model hierarchical objects, such as procedure calls, with inclusion edges.
But it is too simple for an interactive debugging tool because precedence edges only model normal
control flow². On the other hand, FORMAN lets an event have attributes to represent the current
program status and other things. So, to represent a program state, FORMAN uses attributes on vertices
while we use a graph structure (a set of labels on edges), i.e., a cut-set, in order to improve interactive
debugging performance.

Other Approaches: From our point of view, constraint- or assertion-based approaches aim at
automating step 4. That is to say, their purpose is to check a state without human effort by using
predefined predicates, from which one might hopefully obtain a specification of the program.
Knowledge-based approaches aim at finding a better $C_t$ so as to prune as large a part of the execution
tree as possible at one time. Slicing is a technique to construct an effective execution tree to find faults.
Here, the word "effective" means that the edges of the tree lead programmers to faults as fast as possible
without making a detour.



4 Conclusions 

In this paper, we proposed a mathematical framework for automated bug localization. Based on this
framework, we are now implementing an assertion-based automated bug localization system for
distributed programs. It will be published in the near future.

² To be fair to FORMAN, it is sufficient for an off-line event grammar checker.